Multiple imputation and maximum likelihood principal component analysis of incomplete multivariate data from a study of the ageing of port
Abstract
A multivariate data matrix containing a number of missing values was obtained from a study on the changes in colour and phenolic composition during the ageing of port. Two approaches were taken in the analysis of the data. The first involved the use of multiple imputation (MI) followed by principal components analysis (PCA). The second examined the use of maximum likelihood principal component analysis (MLPCA). The use of multiple imputation allows missing-value uncertainty to be incorporated into the analysis of the data. Initial estimates of the missing values were first calculated using the Expectation Maximization (EM) algorithm, followed by Data Augmentation (DA) in order to generate five imputed data matrices. Each complete data matrix was subsequently analysed by PCA, and the resulting principal component (PC) scores and loadings were averaged to give estimates together with an estimation of their errors. The first three PCs together accounted for 93.3% of the total variance. Changes in colour and monomeric anthocyanin composition were explained on PC1 (79.63% explained variance), phenolic composition and hue mainly on PC2 (8.61% explained variance), and phenolic composition and the formation of polymeric pigment on PC3 (5.04% explained variance). In MLPCA, estimates of measurement uncertainty are incorporated in the decomposition step, with missing values being assigned large measurement uncertainties. PC scores on the first two PCs after multiple imputation and PCA (MI+PCA) were comparable to the maximum likelihood scores on the first two PCs extracted by MLPCA. © 2001 Elsevier Science B.V. All rights reserved.
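The MI+PCA route (EM starting values, Data Augmentation to produce five completed matrices, PCA of each, then averaging) can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the authors' code: the data are assumed multivariate normal, DA uses a noninformative (Jeffreys) prior and takes the five imputations as successive states of one chain, each completed matrix is autoscaled before PCA, and PC signs are aligned to the first imputation before averaging. The spread across imputations is reported as a between-imputation standard deviation, which is only one part of a full Rubin-style variance. All function names (`conditional_normal`, `em_mvn`, `data_augmentation`, `pca`, `mi_pca`) are hypothetical.

```python
import numpy as np
from scipy.stats import invwishart


def conditional_normal(mu, S, x, miss):
    """Mean and covariance of x[miss] given x[~miss] under N(mu, S)."""
    obs = ~miss
    B = np.linalg.solve(S[np.ix_(obs, obs)], S[np.ix_(obs, miss)])  # S_oo^-1 S_om
    mean = mu[miss] + B.T @ (x[obs] - mu[obs])
    cov = S[np.ix_(miss, miss)] - S[np.ix_(miss, obs)] @ B
    return mean, cov


def em_mvn(X, n_iter=200):
    """EM estimates of the mean and covariance of a multivariate normal with NaNs.

    Assumes every row has at least one observed value.
    """
    M = np.isnan(X)
    n = X.shape[0]
    mu, S = np.nanmean(X, axis=0), np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        T1, T2 = np.zeros_like(mu), np.zeros_like(S)
        for x, miss in zip(X, M):
            xf, extra = x.copy(), np.zeros_like(S)
            if miss.any():
                mean, cov = conditional_normal(mu, S, x, miss)
                xf[miss] = mean                      # E-step: expected values...
                extra[np.ix_(miss, miss)] = cov      # ...plus their uncertainty
            T1 += xf
            T2 += np.outer(xf, xf) + extra
        mu = T1 / n                                  # M-step: update mean, covariance
        S = T2 / n - np.outer(mu, mu)
    return mu, S


def data_augmentation(X, mu, S, m=5, steps=100, seed=0):
    """Draw m completed matrices from one DA chain started at the EM estimates."""
    rng = np.random.default_rng(seed)
    M = np.isnan(X)
    n = X.shape[0]
    Xc = np.where(M, mu, X)
    completed = []
    for _ in range(m):
        for _ in range(steps):
            for i in np.flatnonzero(M.any(axis=1)):  # I-step: impute from N(mu, S)
                mean, cov = conditional_normal(mu, S, X[i], M[i])
                Xc[i, M[i]] = rng.multivariate_normal(mean, cov)
            xbar = Xc.mean(axis=0)                   # P-step: draw (mu, S) from the
            A = (Xc - xbar).T @ (Xc - xbar)          # posterior under a Jeffreys prior
            S = invwishart.rvs(df=n - 1, scale=A, random_state=rng)
            mu = rng.multivariate_normal(xbar, S / n)
        completed.append(Xc.copy())
    return completed


def pca(X, n_pc):
    """PCA of the autoscaled matrix: scores, loadings, explained-variance fractions."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_pc] * s[:n_pc], Vt[:n_pc].T, (s**2 / np.sum(s**2))[:n_pc]


def mi_pca(X, m=5, n_pc=3, seed=0):
    """MI+PCA: PCA on each imputed matrix, then mean and spread across imputations."""
    mu, S = em_mvn(X)
    fits = [pca(Xc, n_pc) for Xc in data_augmentation(X, mu, S, m=m, seed=seed)]
    ref = fits[0][1]
    scores, loadings, expl = [], [], []
    for T, P, e in fits:
        sign = np.sign(np.sum(P * ref, axis=0))      # align arbitrary PC signs
        scores.append(T * sign)
        loadings.append(P * sign)
        expl.append(e)
    scores, loadings = np.array(scores), np.array(loadings)
    return (scores.mean(0), scores.std(0, ddof=1),
            loadings.mean(0), loadings.std(0, ddof=1), np.mean(expl, axis=0))
```

Usage: call `mi_pca(X)` with NaN marking the missing cells; the last returned array holds the mean explained-variance fraction per retained PC, so its sum plays the role of the cumulative figure the study reports (93.3% for its first three PCs).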
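In MLPCA the measurement uncertainties enter the decomposition itself, and a missing cell is simply a cell with a very large uncertainty. For independent (uncorrelated) errors this amounts to minimising the weighted residual sum of squares with weights 1/σ², which the following sketch does by alternating weighted least squares for scores and loadings. It is a simplified illustration of that objective, not necessarily the exact algorithm used in the paper; the function name `mlpca`, the uncentred data, and the value 1e4 used as the "large" standard deviation for missing cells are all assumptions for illustration.

```python
import numpy as np


def mlpca(X, sd, rank, max_iter=1000, tol=1e-12):
    """Weighted rank-`rank` fit minimising sum(((X - X_hat) / sd) ** 2).

    NaN cells of X are treated as missing: give them a very large sd so that
    their weight 1/sd**2 is close to zero.
    """
    W = 1.0 / sd**2
    Xf = np.where(np.isnan(X), 0.0, X)           # placeholder value; weight ~ 0
    _, _, Vt = np.linalg.svd(Xf, full_matrices=False)
    V = Vt[:rank].T.copy()                        # start from ordinary PCA loadings
    T = np.empty((X.shape[0], rank))
    prev = np.inf
    for _ in range(max_iter):
        for i in range(X.shape[0]):              # weighted regression -> scores
            Vw = V * W[i][:, None]
            T[i] = np.linalg.solve(V.T @ Vw, Vw.T @ Xf[i])
        for j in range(X.shape[1]):              # weighted regression -> loadings
            Tw = T * W[:, j][:, None]
            V[j] = np.linalg.solve(T.T @ Tw, Tw.T @ Xf[:, j])
        obj = np.sum(W * (Xf - T @ V.T) ** 2)    # weighted residual sum of squares
        if prev - obj <= tol * max(obj, 1.0):
            break
        prev = obj
    X_hat = T @ V.T
    # Re-express the fitted matrix as orthogonal PCA-style scores and loadings
    U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank].T, X_hat, obj


# Example: a missing cell gets sd = 1e4, an observed cell sd = 1.0 (illustrative)
# sd = np.where(np.isnan(X), 1e4, 1.0)
# scores, loadings, X_hat, obj = mlpca(X, sd, rank=2)
```

Because PC signs are arbitrary, scores from this route and from an MI+PCA analysis would need to be sign-aligned before comparing them on the first two PCs, as the abstract does.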
Similar resources
Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)
Most geochemical datasets contain missing data in varying proportions, which may cause significant problems in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...
Maximum likelihood analysis of a logistic regression model when the data on predictor variables are incomplete but auxiliary variables are available
Background and Objectives: Missing data exist in many studies, e.g. in regression models, and they decrease the model's efficacy. Many methods have been suggested for handling incomplete data, but they have generally focused on missing outcome values, whereas covariate values can also be missing. Materials and Methods: In this paper we study missing-value imputation using the EM algorithm and auxiliary varia...
Land Cover Classification Using IRS-1D Data and a Decision Tree Classifier
Land cover is one of the basic data layers in geographic information systems for physical planning and environmental monitoring. Digital image classification is generally performed to produce land cover maps from remote sensing data, particularly for large areas. In the present study, the multispectral IRS LISS-III image, along with ancillary data such as vegetation indices, principal componen...
The Possibility of Created the Vegetation Cover Maps in the Central Zagros Forest by Using the IRS Satellite Image
Preparing vegetation cover maps from land inventory and other traditional methods is costly and time-consuming. Today, however, remote sensing is one of the main sources of data collection and information production for studying and monitoring land resources, and an efficient tool for providing quick and timely data and information for program planning in the natural resource field. ...
Publication date: 2001